Abstract

Prolog is an excellent tool for representing and manipulating data written in formal lan- 
guages as well as natural language. Its safe semantics and automatic memory management 
make it a prime candidate for programming robust Web services. 

Where Prolog is commonly seen as a component in a Web application that is either
embedded or communicates using a proprietary protocol, we propose an architecture where
Prolog communicates with other components in a Web application using the standard HTTP
protocol. By avoiding embedding in external Web servers, development and deployment
become much easier. To support this architecture, in addition to the transfer protocol, we
must also support parsing, representing and generating the key Web document types such
as HTML, XML and RDF.

This paper motivates the design decisions in the libraries and extensions to Prolog for
handling Web documents and protocols. The design has been guided by the requirement
to handle large documents efficiently. The described libraries support a wide range of Web
applications ranging from HTML and XML documents to Semantic Web RDF processing.

The benefits of using Prolog for Web related tasks are illustrated using three case studies.

KEYWORDS: Prolog, HTTP, HTML, XML, RDF, DOM, Semantic Web 



1 Introduction 

The Web is an exciting place offering new opportunities to artificial intelligence,
natural language processing and Logic Programming. Information extraction from
the Web, reasoning in Web applications and the Semantic Web are just a few
examples. We have deployed Prolog in Web related tasks over a long period. As
most of the development on SWI-Prolog takes place in the context of projects that
require new features, the system and its libraries provide extensive support for Web
programming.

There are two views on deploying Prolog for Web related tasks. In the most
commonly used view, Prolog acts as an embedded component in a general Web 
processing environment. In this role it generally provides reasoning tasks such as 
searching or configuration within constraints. Alternatively, Prolog itself can act as 
a stand-alone HTTP server as also proposed by ECLiPSe (Leth et al. 1996). In this 
view it is a component that can be part of any of the layers of the popular three-tier 
architecture for Web applications. Components generally exchange XML if used as 
part of the backend or middleware services and HTML if used in the presentation 
layer. 

The latter view is in our vision more attractive. Using HTTP and XML over
HTTP, the service is cleanly isolated using standard protocols rather than
proprietary communication. Running as a stand-alone application, the attractive
interactive development nature of Prolog can be maintained much more easily than
when embedded in a C, C++, Java or C# application. Using HTTP, automatic testing of
the Prolog components can be done using any Web oriented test framework. HTTP
allows Prolog to be deployed in any part of the service architecture, including the
realisation of complete Web applications in one or more Prolog processes.

When deploying Prolog in a Web application using HTTP, we must not only 
implement the HTTP transfer protocol, but also support parsing, representing and 
generating the important document types used on the Web, especially HTML, XML 
and RDF. Note that, being widely used open standards, supporting these document 
types is also valuable outside the context of Web applications. 

This paper gives an overview of the Web infrastructure we have realised. Given
the range of libraries and Prolog extensions that facilitate Web applications we
cannot describe them in detail. Details on the library interfaces can be found in
the manuals available from the SWI-Prolog Web site.^1 Details on the
implementation are available in the source distribution. The aim of this paper is to give an
overview of the required infrastructure to use Prolog for realizing Web applications
where we concentrate on scalability and performance. We describe our decisions for
representing Web documents in Prolog and outline the interfaces provided by our
libraries.

The benefits of using Prolog for Web related tasks are illustrated using three case
studies: 1) SeRQL, an RDF query language for meta data management, retrieval
and reasoning; 2) XDIG, an extended Description Logic interface, which provides
ontology management and reasoning by processing DIG XML documents and
communicating to external DL reasoners; and 3) a faceted browser on Semantic Web
databases integrating meta-data from multiple collections of artworks. This case
study serves as a complete Semantic Web application serving the end-user.

This paper is organized as follows. Section 2 to section 4 describe reading, writing
and representation of Web related documents. Section 5 describes our HTTP client
and server libraries. Section 6 describes extensions to the Prolog language that
facilitate use in Web applications. Section 7 to section 9 describe the case studies.

^1 http://www.swi-prolog.org

<document>       ::= list-of <content>
<content>        ::= <element> | <pi> | <cdata> | <sdata> | <ndata>
<element>        ::= element(<tag>, list-of <attribute>, list-of <content>)
<attribute>      ::= <name> = <value>
<pi>             ::= pi(<atom>)
<sdata>          ::= sdata(<atom>)
<ndata>          ::= ndata(<atom>)
<cdata>, <name>  ::= <atom>
<value>          ::= <svalue> | list-of <svalue>
<svalue>         ::= <atom> | <number>

Fig. 1. SGML/XML tree representation in Prolog. The notation list-of <x> describes a
Prolog list of terms of type <x>.

2 Parsing and representing XML and HTML documents 

The core of the Web is formed by document standards and exchange protocols. 
Here we describe tree-structured documents transferred as SGML or XML. HTML, 
an SGML application, is the most commonly used document format on the Web. 
HTML represents documents as a tree using a fixed set of elements (tags), where 
the SGML DTD (Document Type Declaration) puts constraints on how elements 
can be nested. Each node in the hierarchy has a name (the element-name), a set of
name-value pairs known as its attributes and content, a sequence of sub-elements 
and text (data). 

XML is a rationalisation of SGML using the same tree-model, but removing 
many rarely used features as well as abbreviations that were introduced in SGML 
to make the markup easier to type and read by humans. XML documents are used 
to represent text using custom application-oriented tags as well as a serialization 
format for arbitrary data exchange between computers. XHTML is HTML based 
on XML rather than SGML. 

The first SGML parser for SWI-Prolog was created by Anjo Anjewierden based
on the SP parser.^2 A stable Prolog term-representation for SGML/XML trees plays
a role similar to that of the DOM (Document Object Model) representation in use in the
object-oriented world. The term-structure we use is described in figure 1. Some
issues have been subject to debate.

^2 http://www.jclark.com/sp/

• Representation of text by a Prolog atom is biased by the use of SWI-Prolog,
which has no length-limit on atoms and whose atoms can represent Unicode text
as motivated in section 6.2. At the same time SWI-Prolog stacks are limited to
128MB each. Using atoms, only the structure of the tree is represented on the
stack, while the bulk of the data is stored on the unlimited heap. Using lists
of character codes is another possibility adopted by both PiLLoW (Gras and
Hermenegildo 2001) and ECLiPSe (Leth et al. 1996). Two observations make
lists less attractive: lists use two cells per character while practical experience
shows text is frequently processed as a unit only. For (HTML) text-documents
we profit from the compact representation of atoms. For XML documents
representing serialized data-structures we profit from frequent repetition of
the same value.

• Attribute values of multi-value attributes (e.g. NAMES) are returned as a Prolog 
list. This implies the DTD must be available to get unambiguous results. With 
SGML this is always true, but not with XML. 

• Optionally attribute values of type NUMBER or NUMBERS are mapped to Prolog 
numbers. In addition to the DTD issues mentioned above, this conversion also 
suffers from possible loss of information. Leading zeros and different floating 
point number notations used are lost after conversion. Prolog systems with 
bounded arithmetic may also not be able to represent all values. Still, au- 
tomatic conversion is useful in many applications, especially those involving 
serialized data-structures. 

• Attribute values are represented as Name=Value. Using Name(Value) is an
alternative. The Name=Value representation was chosen for its similarity to
the SGML notation and because it avoids the need for univ (=..) for
processing argument-lists.
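
To make the representation choices above concrete, the following minimal sketch walks a term of the shape given in figure 1 and collects the href attribute of every a element. The predicate names dom_links/2 and dom_link/2 are hypothetical and introduced here for illustration only; they are not part of the described libraries.

```prolog
:- use_module(library(lists)).

% dom_links(+DOM, -URLs): URLs is the list of href values of all <a>
% elements in DOM (a list-of content as in figure 1), in document order.
dom_links(DOM, URLs) :-
    findall(URL, dom_link(DOM, URL), URLs).

% dom_link(+Content, -URL): enumerate href values on backtracking.
% Text nodes are atoms, so they never unify with element/3 and are skipped.
dom_link(Content, URL) :-
    member(element(Tag, Attrs, Children), Content),
    (   Tag == a,
        memberchk(href=URL, Attrs)   % Name=Value: plain unification, no =.. needed
    ;   dom_link(Children, URL)      % descend into nested content
    ).
```

Because attributes are Name=Value pairs in an ordinary list, a single memberchk/2 call selects an attribute by name.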

Implementation The SWI-Prolog SGML/XML parser is implemented as a C-library
that has been built from scratch to create a lightweight parser. Total source
is 11,835 lines. The parser provides two interfaces. Most natural to Prolog is
load_structure(+Src, -DOM, +Options), which parses a Prolog stream into a term
as described above. Alternatively, sgml_parse/2 provides an event-based parser
making call-backs on Prolog for the SGML events. The call-back mode can deal with
unbounded documents in streaming mode. It can be mixed with the term-creation
mode, where the handler for begin calls the parser to create a term-representation
for the content of the element. This feature is used to process long files with a
repetitive record structure in limited memory. Section 4.1 describes how this is used to
process RDF documents.
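
The two interfaces can be sketched as follows, assuming a recent SWI-Prolog with library(sgml) installed. The predicates parse_xml_text/2, xml_records/2 and on_begin/3, the dynamic predicate seen_record/1 and the element name record are hypothetical names chosen for this illustration. The first predicate uses the term-creation mode; the second uses the call-back mode and switches to term creation for the content of each record element.

```prolog
:- use_module(library(sgml)).

:- dynamic seen_record/1.

% parse_xml_text(+Text, -DOM): term-creation mode on a complete document.
parse_xml_text(Text, DOM) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).

% xml_records(+Text, -Records): call-back mode. Only the content of one
% <record> element is turned into a term at a time.
xml_records(Text, Records) :-
    retractall(seen_record(_)),
    new_sgml_parser(Parser, []),
    set_sgml_parser(Parser, dialect(xml)),
    setup_call_cleanup(
        open_string(Text, In),
        sgml_parse(Parser, [source(In), call(begin, on_begin)]),
        ( close(In), free_sgml_parser(Parser) )),
    findall(R, retract(seen_record(R)), Records).

% Handler for the begin event: for <record>, parse its content into a term.
on_begin(record, _Attrs, Parser) :- !,
    sgml_parse(Parser, [document(Content), parse(content)]),
    assertz(seen_record(Content)).
on_begin(_Tag, _Attrs, _Parser).     % ignore all other open tags
```

A real application would process each record inside the handler instead of collecting the records, which keeps memory usage bounded by the size of the largest record.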

Full documentation is available from http://www.swi-prolog.org/packages/sgml2pl.html.
The SWI-Prolog SGML parser has been adopted by XSB Prolog.

3 Generating Web documents 

There are many approaches to generating Web pages from programs in general and 
Prolog in particular. We believe the preferred choice depends on various aspects. 

• How much of the document is generated from dynamic data and how much is
static? Pages that are static except for a few strings are best generated from
a template using variable substitution. Pages consisting of a table generated
from dynamic data are best entirely generated from the program.

• For program generated pages we can choose between direct printing and
generating using a language-native syntax, for example format('<b>bold</b>')
or print_html(b(bold)). The second approach can guarantee well-formed
output, but the first requires the programmer to learn about format/3 only.

• Documents that contain a significant static part are best represented in the
markup language where special constructs insert program-generated parts.
A popular approach implemented by PHP^3 and ASP^4 is to add a reserved
element such as <script> and use the SGML/XML processing instruction
written as <? ... ?>. The obvious name PSP (Prolog Server Pages) is in use by
various projects taking this approach.^5 Another approach is PWP^6 (Prolog
Well-formed Pages). It is based on the principle that the source is well-formed
XML and interacts with Prolog through additional attributes. Output is
guaranteed to be well-formed XML. Our infrastructure does not yet include any
of these approaches.

• Page transformation is realised by parsing the original document into its
tree representation, managing the tree and writing a new document from the
tree. Managing the source-text directly is not reliable because, due to character
encoding choice, entity usage and SGML abbreviations, there are many
different source-texts that represent the same tree. The load_structure/3
predicate described in section 2 together with output primitives from the library
sgml_write.pl provide this functionality. The XDIG case study described in
section 8 follows this approach.
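
As an illustration of page transformation on the tree, the sketch below renames every element with a given tag and writes the result back as XML. The predicates rename_elements/4, rename_node/4 and transform_file/2 are hypothetical helpers for this example; load_structure/3 and xml_write/3 are the library calls, and we assume xml_write/3 accepts the list of content produced by the parser.

```prolog
:- use_module(library(sgml)).
:- use_module(library(sgml_write)).
:- use_module(library(apply)).

% rename_elements(+From, +To, +DOM0, -DOM): rename all elements with tag
% From to To, leaving attributes and all other nodes untouched.
rename_elements(From, To, DOM0, DOM) :-
    maplist(rename_node(From, To), DOM0, DOM).

rename_node(From, To, element(Tag0, Attrs, Content0),
                      element(Tag,  Attrs, Content)) :- !,
    (   Tag0 == From
    ->  Tag = To
    ;   Tag = Tag0
    ),
    rename_elements(From, To, Content0, Content).
rename_node(_, _, Node, Node).       % text, pi(_), sdata(_) and ndata(_)

% transform_file(+InFile, +OutFile): parse, transform the tree, write.
% The tags b and strong are arbitrary choices for the example.
transform_file(InFile, OutFile) :-
    load_structure(InFile, DOM0, [dialect(xml)]),
    rename_elements(b, strong, DOM0, DOM),
    setup_call_cleanup(
        open(OutFile, write, Out),
        xml_write(Out, DOM, []),
        close(Out)).
```

The transformation never sees the source text, so it is independent of encoding, entity usage and abbreviations in the input.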

3.1 Generating documents using DCG 

The traditional method for creating Web documents is using print routines such 
as write/1 or format/2. Although simple and easily explained to novices, the 
approach has serious drawbacks from a software engineering point of view. In par- 
ticular the user is responsible for HTML quoting, character encoding issues and 
proper nesting of HTML elements. Automated validation is virtually impossible 
using this approach. 

Alternatively we can produce a DOM term as described in section 2 and use the
library sgml_write.pl to create the HTML or XML document. Such documents
are guaranteed to use proper nesting of elements, escape sequences and character
encoding. The terms however are big, deeply nested and hard to read and write.
Prolog allows them to be built from skeletons containing variables. This approach is
taken by PiLLoW (section 3.2) to control the complexity. In our opinion, the result
is not optimal due to the unnatural order of statements as illustrated in figure 2.
PiLLoW has partly overcome this shortcoming by defining a large number of 'utility
terms' that are translated in a special way, as discussed in section 6.2 of (Gras and
Hermenegildo 2001).

^3 www.php.net
^4 www.microsoft.com
^5 http://www.prologonlinereference.org/psp.psp,
   http://www.benjaminjohnston.com.au/template.prolog?t=psp,
   http://www.ifcomputer.com/inap/inap2001/program/inap_bartenstein.ps
^6 http://www.cs.otago.ac.nz/staffpriv/ok/pwp.pl

    mkthumbnail(URL, Caption, ThumbNail),
    output_html([ h1("Photo gallery"),
                  ThumbNail
                ]).

mkthumbnail(URL, Caption, Term) :-
    Term = table([ tr(td([halign=center], img([src=URL], []))),
                   tr(td([halign=center], Caption))
                 ]).

Fig. 2. Building PiLLoW terms

We introduced a DCG rule html//1.^7 This rule translates proper trees into a list
of high-level HTML/XML commands that are handed to print_html/1 to realise
proper quoting, character encoding and layout. The intermediate format is of no
concern to the user and similar in structure to the PiLLoW representation without
using environments. Generated from the tree representation however, consistent
opening and closing of elements is guaranteed. In addition to variable substitution
which is provided by Prolog we allow calling rules. Rules are invoked by a term
\Rule embedded in the argument of html//1. Figure 3 illustrates our approach.
Note that any reusable part of the page generation can easily be translated into
a DCG rule and the difference between direct translation of terms to HTML and
rule-invocation is eminent.

In our current implementation rules are called using meta-calling from html//1.
Using term_expansion/2 it is straightforward to move the rule invocation out
of the term, using variable substitution similar to PiLLoW. It is also possible to
recursively expand the generated tree and validate it against the HTML DTD at
compile-time and even insert omitted tags at compile-time to generate valid XHTML from
an incomplete specification. An overview of the argument to html//1 is given in
figure 4.
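
The guarantee of proper quoting can be checked with a small sketch, assuming library(http/html_write) as shipped with current SWI-Prolog. The rules shopping_list//1 and items//1 and the helper fragment_text/2 are hypothetical names for this example. The fragment is generated into a string so that the escaped text can be inspected.

```prolog
:- use_module(library(http/html_write)).

% shopping_list(+Items)//: a page fragment. The term \items(Items) is not
% an HTML element but a call to the DCG rule items//1.
shopping_list(Items) -->
    html([ h1('Shopping list'),
           ul(\items(Items))
         ]).

items([]) --> [].
items([H|T]) --> html(li(H)), items(T).

% fragment_text(+Items, -Text): generate the tokens and print them to a string.
fragment_text(Items, Text) :-
    phrase(shopping_list(Items), Tokens),
    with_output_to(string(Text), print_html(Tokens)).
```

Calling fragment_text/2 with an item such as 'a < b' should yield text in which the character < appears as an entity; the programmer never writes the escape sequence by hand.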

3.2 Comparison with PiLLoW 

The PiLLoW library (Gras and Hermenegildo 2001) is a well established framework 
for Web programming based on Prolog. PiLLoW defines html2terms/2, convert- 
ing between an HTML string and a document represented as a Herbrand term. 
There are fundamental differences between PiLLoW and the primitives described 
here. 

^7 The notation <name>//<arity> refers to the grammar rule <name> with the given <arity>, and
consequently the predicate <name> with arity <arity>+2.
affiliation_table :-
    findall(Name-Aff, affiliation(Name, Aff), Pairs0),
    keysort(Pairs0, Pairs),
    reply_page(table([border(2), align(center)],
                     [ tr([th('Name'), th('Affiliation')])
                     | \affiliations(Pairs)
                     ])).

affiliations([]) --> [].
affiliations([H|T]) --> affiliation(H), affiliations(T).

affiliation(Name-Aff) --> html(tr([td(Name), td(Aff)])).

% database
affiliation(wielemaker, uva).
affiliation(huang, vu).
affiliation('van der meij', vu).

% Page template
reply_page(Term) :-
    format('Content-type: text/html~n~n'),
    phrase(html(Term), Tokens),
    print_html(Tokens).

Fig. 3. Library html_write.pl in action



<html>           ::= list-of <content> | <content>
<content>        ::= <atom>
                  |  <tag>(list-of <attribute>, <html>)
                  |  <tag>(<html>)
                  |  \<rule>
<attribute>      ::= <name>(<value>)
<tag>, <entity>  ::= <atom>
<value>          ::= <atom> | <number>
<rule>           ::= <callable>

Fig. 4. The html//1 argument specification



• PiLLoW creates an HTML document from a Herbrand term that is passed 
to html2terms/2. Complex terms are composed of partial terms passed as 
Prolog variables. Frequent HTML constructs are supported using reserved 
terms using dedicated processing. We use DCGs and the \Rule construct, 
which makes it eminent which terms directly refer to HTML elements and 
which function as a 'macro'. In addition, the user can define application- 
specific reusable fragments in a uniform way. 

• The PiLLoW parser does not create the SGML document tree. It does not
insert omitted tags, default attributes, etcetera. As a result, HTML
documents that differ only in omitted tags and whether or not default attributes
are included in the source produce different terms. In our approach the term
representation is equivalent, regardless of the input document. This is
illustrated in figure 5. Having a canonical DOM representation greatly simplifies
processing parsed HTML documents.

[env(table, [], [tr$[], td$[], "Hello"])]

[element(table, [],
         [ element(tbody, [],
                   [ element(tr, [],
                             [ element(td, [rowspan='1', colspan='1'],
                                       ['Hello'])])])])]

Fig. 5. Term representations for <table><tr><td>Hello</table> in PiLLoW (top) and
our parser (bottom). Our parser completes the tr and td environments, inserts the omitted
tbody element and inserts the defaults for the rowspan and colspan attributes

Fig. 6. Sample RDF graph. Ellipses are vertices representing URIs. Quoted text is a
literal. Edges are labelled with URIs.



4 RDF documents 

Where the datamodel of both HTML and XML is a tree-structure with
attributes, the datamodel of the Semantic Web (SW) RDF^8 language consists of
{Subject, Predicate, Object} triples. Both Subject and Predicate are URIs.^9 Object
is either a URI or a Literal. As the Object of one triple can be the Subject of
another, a set of triples forms a graph, where each edge is labelled with a URI (the
Predicate) and each vertex is either a URI or a literal. Literals have no out-going
edges. Figure 6 illustrates this.

A number of languages are layered on top of the RDF triple model. RDFS
provides a frame-based representation. The OWL-dialects^10 provide three increasingly
complex Web ontology languages. SWRL^11 is a proposal for a rule language. The
W3C standard for exchanging these triple models is an XML application known as
RDF/XML.

^8 http://www.w3.org/RDF/
^9 URI: Uniform Resource Identifier is like a URL, but need not refer to an existing resource on
   the Web.
^10 http://www.w3.org/2004/OWL/

<subject>, <predicate>  ::= <URI>
<object>                ::= <URI>
                         |  literal(<lit-value>)
<lit-value>             ::= <text>
                         |  lang(<langid>, <text>)
                         |  type(<URI>, <text>)
<URI>, <text>           ::= <atom>
<langid>                ::= <atom> (ISO639)

Fig. 7. RDF types in Prolog.

As there are multiple XML tree representations for the same triple-set, RDF
documents cannot be processed at the level of the XML-DOM as described in
section 2. A triple- or graph-based structure is the most natural choice for
representing an RDF document in Prolog. First we must decide on the representation
of URIs and literals. As a URI is a string and the only operation defined on URIs
by SW languages is equivalence test, using a Prolog atom is an obvious choice. One
may consider using a term <namespace>:<localname>, but given that decomposing a
URI into its namespace and localname is only relevant during I/O we consider this
an inferior choice. The RDF library comes with a compile-time rewrite mechanism
based on goal_expansion/2 that allows for writing resources in Prolog sourcetext
as <ns>:<local>. Literals are expressed as literal(Value). The full type description is
in figure 7.

The typical SW use-scenario is to 'harvest' triples from multiple sources and
collect them in a database before reasoning with them. Prolog can represent data
as a Herbrand term on the stack or as predicates in the database. Given the
relatively static nature of the RDF data as well as desired access from multiple threads,
using the Prolog database is the most obvious choice. Here we have two options.
One is the predicate rdf(Subject, Predicate, Object) using the argument types
described above. The alternative is to map each RDF predicate on a Prolog predicate
Predicate(Subject, Object). We have chosen rdf/3 because it supports queries
with uninstantiated predicates better and a single predicate is easier to manage
than an unbounded set of predicates with unknown names.
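
The advantage of a single rdf/3 predicate for queries with an unbound predicate can be seen in a toy version that uses plain dynamic facts. This is a sketch for illustration, not the storage module described in section 4.2; the example.org URIs and the helpers properties_of/2 and resource_link/3 are hypothetical.

```prolog
:- dynamic rdf/3.

% A few triples using the types of figure 7: URIs are atoms, literals
% are wrapped in literal/1.
rdf('http://example.org/mary', 'http://example.org/name',
    literal('Mary')).
rdf('http://example.org/mary', 'http://example.org/knows',
    'http://example.org/john').
rdf('http://example.org/john', 'http://example.org/name',
    literal(lang(en, 'John'))).

% properties_of(+Subject, -Pairs): all Predicate-Object pairs of Subject.
% The predicate argument is left unbound, which would require enumerating
% all predicate names in the Predicate(Subject, Object) representation.
properties_of(Subject, Pairs) :-
    findall(P-O, rdf(Subject, P, O), Pairs).

% resource_link(?S, ?P, ?O): triples whose object is a resource, not a literal.
resource_link(S, P, O) :-
    rdf(S, P, O),
    atom(O).
```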



The RDF/XML parser is realised as a Prolog library on top of the XML parser
described in section 2. Similar to the XML parser it has two interfaces. The predicate
load_rdf(+Src, -Triples, +Options) parses a document and returns a Prolog list of
rdf(S,P,O) triples. Note that although harvesting to the database is the typical
use-case scenario, the parser delivers a list of triples for maximal flexibility. The
predicate process_rdf(+Src, :Action, +Options) exploits the mixed call-back/convert
mode of the XML parser to process the RDF file one description (record) at a
time, calling Action with a list of triples extracted from the description. Figure 8
illustrates how this is used by the storage module to load unbounded files with
limited stack usage. Source location as <file>:<line> is passed to the Src argument of
assert_triples/2.

^11 http://www.w3.org/Submission/SWRL/

load_triples(File, Options) :-
    process_rdf(File, assert_triples, Options).

assert_triples([], _).
assert_triples([rdf(S,P,O)|T], Src) :-
    rdf_assert(S, P, O, Src),
    assert_triples(T, Src).

Fig. 8. Loading triples using process_rdf/3

<Painting rdf:about="...">
  <dimension>
    <Dimension width="45" height="50"/>
  </dimension>
</Painting>

Fig. 9. Blank node to express the compound dimension property
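
The list-returning interface load_rdf/3 can be tried on a small in-memory document. The sketch below assumes library(rdf) from a current SWI-Prolog and that the source may be given as stream(Stream); rdf_text_triples/2, example_rdf/1 and the example.org names are hypothetical and only serve the illustration.

```prolog
:- use_module(library(rdf)).

% rdf_text_triples(+Text, -Triples): parse RDF/XML held in a string
% into a list of rdf(S,P,O) terms.
rdf_text_triples(Text, Triples) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_rdf(stream(In), Triples, []),
        close(In)).

% example_rdf(-Text): a one-description document. Single quotes delimit
% the XML attribute values so the text fits in a Prolog string.
example_rdf(Text) :-
    atomics_to_string(
        [ "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'",
          " xmlns:ex='http://example.org/'>",
          "<rdf:Description rdf:about='http://example.org/mary'>",
          "<ex:name>Mary</ex:name>",
          "</rdf:Description>",
          "</rdf:RDF>"
        ], Text).
```

The resulting list can be inspected, filtered or asserted, which is the flexibility the list interface is meant to offer.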

In addition to named URIs, RDF resources can be blank-nodes. A blank-node
(short bnode) is an anonymous resource that is created from an in-lined description.
Figure 9 describes the dimensions of a painting as a compound instance of class
Dimension with width and height properties. The Dimension instance has no URI.
Our parser generates an identifier that starts with a double underscore, followed by
the source and a number. The double underscore is used to identify bnodes. Source
and number are needed to guarantee the bnode is unique.

The parser from XML to RDF triples covers the full RDF specification, including
Unicode handling, RDF datatypes and RDF language tags. The Prolog source is
1,788 lines. It processes approximately 9,000 triples per second on an AMD 1600+
based computer. Implementation details and evaluation of the parser are described
in (Wielemaker et al. 2003).^12

We have two libraries for writing RDF/XML. One,
rdf_write_xml(+Stream, +Triples), provides the inverse of load_rdf/2, writing
an XML document from a list of rdf(S,P,O) terms. The other, called rdf_save/2,
is part of the RDF storage module described in section 4.2 and writes a database
directly to a file or stream. The first (rdf_write_xml/2) is used to exchange
computed graphs with external programs using network communication, while the
second (rdf_save/2) is used to save modified graphs back to file. The resulting
code duplication is unfortunate, but unavoidable. Creating a temporary graph in a
database requires potentially much memory and harms concurrency, while fetching
a graph from the database into a list may exceed the Prolog stacks and is also
considerably slower than a direct write.

^12 The parser described there did not yet support RDF datatypes and language tags, nor Unicode.

Index pattern
(S P O)               Calls
- - -                    58
+ - -               253,554
- + -                    62
+ + -            23,292,353
- - +               633,733
- + +             7,807,846
+ + +            26,969,003

Table 1. Calls per index pattern; + marks an instantiated argument, - an unbound one.

4.2 Storage of RDF 

Assuming the 'harvesting' use-case, we need to implement a predicate
rdf(?S,?P,?O). Indexing the database is crucial for good performance. Table 1
illustrates the calling pattern from a real-world application counting 4 million triples.
Also note that our data is described by figure 7. The RDF store was developed in
the context of projects which formulated the following requirements.

• Up to at least 10 million triples on 32-bit hardware.

• Fast graph traversal using any instantiation pattern. 

• Case-insensitive search on literals. 

• Prefix search on literals for completion in the User Interface. 

• Searching for words that appear in literals. 

• Multi-threaded access based on read/write locks. 

• Transaction management and persistent store. 

• Maintain source information, so we can update, save or remove data based 
on its source. 

• Fast load/save of current state. 

Our first version of the database used the Prolog database with secondary
tables to improve indexing. As requirements pushed us against the limits of what is
achievable in a 32-bit address-space we decided to implement the low level store
in C. Profiting from the known uniform structure of the data we realised about
two times more compact storage with better indexing than using a pure Prolog
approach. We took the following design decisions for the C-based storage module:

• The RDF predicates are represented as unique entities and organised accord- 
ing to the rdfs:subPropertyOf relation in multiple hierarchies. The root of 
each hierarchy is used to compute the hash for the triple. If there is no unique 
root due to a cycle an arbitrary predicate is assigned to be the root. 

• Literals are kept in an AVL tree, sorted case-insensitive and case-preserving
(e.g. AaBb...). Numeric literals precede all non-numeric and are kept sorted
on their numeric value. Storing literals in a separate sorted table allows for
indexed search for prefixes and numeric values. It also allows for
monitoring creation and destruction of literals to maintain derived tables such as
stemming or double metaphone (Philips 2000) based on rdf_monitor/2
described below. The space overhead of maintaining the table is roughly
cancelled by avoiding duplicates. Experience on real data ranges between -5%
and +10%.

• Resources are represented by Prolog atom-handles. The hash is computed
from the handle-value. Note that avoiding the translation between Prolog
atom and text avoids both duplication of data and table-lookup. We consider
this a crucial aspect.

• Each triple is represented by the atom-handle for the subject, predicate-pointer,
atom-handle or literal pointer for object, a pointer to the source,
a line number, a general bit-flag field and 6 'hash-next' pointers covering all
indexing patterns except for +,+,+ and +,-,+. Queries using the pattern
+,-,+ are rare. Fully instantiated queries internally use the pattern +,+,-,
assuming few values on the same property. Considering experience with real
data we will probably add a +,+,+ index in the future. The un-indexed
table is a simple linked list. The others are hash-tables that are automatically
resized if they become too populated.

The store itself does not allow for writes while there are active reads in progress. 
If another thread is reading, the write operation will stall until all threads have 
finished reading. If the thread itself has an open choicepoint a permission error 
exception is raised. To arrive at meaningful update semantics we introduced trans- 
actions. The thread starting a transaction obtains a write-lock, initially allowing 
readers to proceed. During the transaction all changes are recorded in a linked list 
of actions. If the transaction is ready for commit, the thread denies access to new 
readers and waits for all readers to vanish before updating the database.
Transactions are realised by rdf_transaction(:Goal). If Goal succeeds, its choicepoints
are discarded and the transaction is committed. If Goal fails or raises an exception
the transaction is discarded and rdf_transaction/1 returns failure or exception.
Transactions can be nested. Nesting a transaction places a transaction-mark in the 
list of actions of the current transaction. Committing implies removing this mark 
from the list. Discarding removes all action cells following the mark as well as the 
mark itself. 
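
The commit and discard behaviour can be illustrated with a short sketch, assuming library(semweb/rdf_db) from a current SWI-Prolog. The predicates add_person/2 and try_bad_update/1 and the resource names type, name and person are hypothetical; plain atoms are used as resources to keep the example short.

```prolog
:- use_module(library(semweb/rdf_db)).

% add_person(+Id, +Name): both triples become visible together when the
% transaction commits.
add_person(Id, Name) :-
    rdf_transaction(
        (   rdf_assert(Id, type, person),
            rdf_assert(Id, name, literal(Name))
        )).

% try_bad_update(+Id): the goal asserts a triple and then fails, so the
% transaction is discarded and rdf_transaction/1 itself fails.
try_bad_update(Id) :-
    rdf_transaction(
        (   rdf_assert(Id, type, person),
            fail
        )).
```

After a call to try_bad_update/1 the database should hold no triple for the given resource, because the recorded action was never committed.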
% triples
mary type woman .
woman type Class .
woman subClassOf human .
human type Class .

% entailment interface          % RDFS interface
?- rdf(mary, type, X).          ?- rdfs_individual_of(mary, X).
X = woman ;                     X = woman ;
X = human ;                     X = human ;
No                              No

Fig. 10. Different interface styles for RDFS



It is possible to monitor the database using rdf_monitor(:Goal, +Events).
Whenever one of the monitored events happens Goal is called. For modifying actions
inside a transaction, Goal is called during the commit. Modifications by the monitors
are collected in a new transaction which is committed immediately after
completing the preceding commit. Monitor events are assert, retract, update, new_literal,
old_literal, transaction begin/end and file-load. Goal is called in the modifying
thread. As this thread is holding the database write lock, all invocations of monitor
calls are fully serialized.

Although the 9,000 triples per second of the RDF/XML parser ranks it among 
the fast parsers, loading 10 million triples takes nearly 20 minutes. For this reason 
we developed a binary format. The format is described in (Wielemaker et al. 2003) 
and loads approximately 10 times faster than RDF/XML, while using about the 
same space. The format is independent from byte-order and word-length, supporting
both 32- and 64-bit hardware.

Persistency is achieved through the library rdf_persistency.pl, which uses
rdf_monitor/2 to maintain a set of files in a directory. Each source known to
the database is represented by two files, one file representing the initial state
using the quick-load binary format and one file containing Prolog terms representing
changes, called the journal.



4.3 Reasoning with RDF documents

We have identified two approaches for reasoning on top of the plain RDF predicate
for more high-level languages such as RDFS or OWL. One approach is taken by the
SeRQL query system described in section 7. It is based on the observation that these
languages provide rules to deduce new triples from the set of known triples. The
API for high level languages is now simply the rdf/3 predicate, where rdf(S,P,O)
is true for any triple in the deductive closure of the original triple set under the
given language. The deductive closure can be realised using full forward reasoning,
deducing new triples until this is no longer possible, or by a combination of backward
reasoning and forward reasoning. An alternative approach is to consider RDFS or
OWL at the conceptual level and introduce a set of predicates that are inspired
by this level. This approach is taken by our library rdfs.pl, defining predicates
such as rdfs_individual_of(?Resource, ?Class) and rdfs_subclass_of(?Class, ?Super).
Figure 10 illustrates the difference in these approaches.

4.4 Experience

The RDF infrastructure is used in two of the three case-studies described at the
end of this paper. RDF has a natural representation in Prolog, either as a list of
terms or as a pure predicate. Prolog non-determinism greatly simplifies querying
the database. Transitivity is easily expressed using recursion. However, as cycles in
SW graphs are allowed and frequent, such algorithms must be protected against
them. Cycle-detection complicates the code and harms performance. We plan to
investigate tabling (Ramakrishnan et al. 1995) to improve on this situation.
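
The kind of protection meant here can be sketched in plain Prolog. The example below is not taken from our libraries: edge/2 is a hypothetical stand-in for a transitive property stored in the RDF database, and the small graph is invented for illustration. The list of visited nodes is what prevents the recursion from running around the cycle forever, and it is also the source of the extra cost mentioned above.

```prolog
:- use_module(library(lists)).   % memberchk/2

% Hypothetical example graph with a cycle: a -> b -> c -> a, plus c -> d.
edge(a, b).
edge(b, c).
edge(c, a).
edge(c, d).

%% reachable(+From, ?To)
%  True if To can be reached from From in one or more edge/2 steps.
%  Nodes already on the current path (including From itself) are
%  never entered again, so the search terminates on cyclic graphs.
reachable(From, To) :-
    reachable(From, To, [From]).

reachable(From, To, Visited) :-
    edge(From, Next),
    \+ memberchk(Next, Visited),
    (   To = Next
    ;   reachable(Next, To, [Next|Visited])
    ).
```

On this graph, findall(X, reachable(a, X), Xs) yields Xs = [b, c, d]. In graphs where a node can be reached along several paths the same answer is produced more than once, so callers may need to remove duplicates.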

Although designed as an RDF store for SW-based projects, the infrastructure is
also commonly used to create RDF documents from other sources as well as for
filtering and reorganizing RDF documents. In the e-culture project [13] it has been
used to convert WordNet (Miller 1995) from its Prolog representation and the Getty
thesauri [14] from XML (4GB data) into RDF.

5 Supporting HTTP 

HTTP, or HyperText Transfer Protocol, is the key W3C standard protocol for
exchanging Web documents. All browsers and Web servers implement it. The initial
version of the protocol was very simple. The client request consists of a single line
of the format <action> <path>; the server replies with the requested document and
closes the connection. Version 1.1 of the protocol is more complicated, providing
additional name-value pairs in the request as well as the reply, and features to request
status such as modification time, to transfer partial documents, etcetera.

Adding HTTP support in Prolog, we must consider both the client- and server- 
side. In both cases our choice is between doing it in Prolog or re-using an existing 
application or library by providing an interface for it. We compare our work with 
PiLLoW (Cabeza and Hermenegildo 2003) and the ECLiPSe HTTP services (Leth 
et al. 1996). 

Given a basic TCP/IP socket library, writing an HTTP client is trivial (our client
is 258 lines of code). Both PiLLoW and ECLiPSe include a client written in Prolog. 
More issues complicate the choice for a pure Prolog based server. 

• The server is more complex, which implies there is more to gain by re-using 
external code. Our core server library counts 1,784 lines. 

• A single computer can only host one server at port 80 used by default for
public HTTP. Using an alternate port for middleware and storage tier
components is no problem, but use as a public server often conflicts with firewall
or proxy settings. This can be solved using a proxy server such as the Apache
mod_proxy [15] configured as reverse proxy.

• Servers by definition introduce security risks. Administrators are reluctant to 
see non-proven software in the role of a public server. Using a proxy as above 
also reduces this risk, especially if the proxy blocks malformed requests. 

13 e-culture.multimedian.nl
14 http://www.getty.edu/research/conducting_research/vocabularies/
15 http://httpd.apache.org/docs/1.3/mod/mod_proxy.html

[Figure: dependency graph of the modules http_client.pl, http_wrapper.pl, inetd_http.pl, http_header.pl and thread_http.pl]

Fig. 11. Module dependencies of the HTTP library



Despite those observations, we consider, like the ECLiPSe team, a pure Prolog 
based server worthwhile. As argued in section 6.1, many Prolog Web applications 
profit from using state stored in the server. Large resources such as WordNet (Miller 
1995) cause long startup times. In such cases the use of CGI (Common Gateway 
Interface) is not appropriate as a new copy of the application is started for each 
request. PiLLoW resolves this issue by using Active Modules, where a small CGI 
application talks to a continuously running Prolog server using a private protocol. 
Using a Prolog HTTP server and optionally a reverse proxy has the same benefits, 
but based on a standard protocol, it is much more flexible. 

Another approach is embedding Prolog in another server framework such as the 
Java based Tomcat server. Although feasible, embedding non-Java based Prolog 
systems in Java is complicated. Embedding through JNI introduces platform and
Java version dependent problems. Connecting Prolog and Java concurrency models 
and garbage collection is difficult and the resulting system is much harder to manage 
by the user than a pure Prolog based application. 

In the following sections we describe our HTTP client and server libraries. An
overview of the modules and their dependencies is given in figure 11.



5.1 HTTP client libraries 

We support two clients. The first is a lightweight client that only supports the
HTTP GET method by means of http_open(+URL, -Stream, +Options). Options
allows for setting a timeout or proxy as well as getting information from the reply
header such as the size of the document. The http_open/3 predicate internally
handles HTTP 3xx (redirect) replies. Other non-ok replies are mapped to a Prolog
exception. After reading the document the user must close the returned stream
handle using the standard Prolog close/1 predicate. This predicate makes accessing
an HTTP resource as simple as accessing a local file. The second library, called
http_client.pl, provides support for HTTP POST and a plugin interface that
allows for installing handlers for documents of specified MIME-types. It shares
http_header.pl with the server libraries for DCG based creation and parsing of
HTTP headers. Currently provided plugins include http_mime_plugin.pl to handle
multipart MIME messages and http_sgml_plugin.pl for automatically parsing
HTML, XML and SGML documents. Figure 12 shows the code for fetching a URL
and parsing the returned HTML document into a Prolog term as described in
section 2.

?- use_module(library('http/http_client')).
?- use_module(library('http/http_sgml_plugin')).
?- http_get('http://www.swi-prolog.org/', DOM, []).

DOM = [element(html, [version='-//W3C//DTD HTML 4.0 Transitional//EN'],
               [element(head, [],
                        [element(title, [],
                                 ['SWI-Prolog\'s Home']), ...

Fig. 12. Fetching an HTML document

Both the PiLLoW and ECLiPSe approach return the document's content as a
string. Our interface is stream-based (http_open/3) or allows for plugin-based
processing of the stream (http_get/3, http_post/4). This interface avoids
potentially large intermediate data-structures and allows for processing unbounded
documents.
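
The stream-based style can be sketched as follows. The helper url_first_line/2 is a hypothetical name introduced for this example, and the module path library(http/http_open) is an assumption that holds for current SWI-Prolog distributions. Only one line is read from the stream, so the size of the remote document does not matter, and setup_call_cleanup/3 ensures that the stream is closed even if reading raises an exception.

```prolog
:- use_module(library(http/http_open)).   % http_open/3
:- use_module(library(readutil)).         % read_line_to_string/2

%% url_first_line(+URL, -Line)
%  Line is the first line of the document at URL, as a string.
%  The connection is opened with a 10 second timeout and is always
%  closed afterwards with the standard close/1.
url_first_line(URL, Line) :-
    setup_call_cleanup(
        http_open(URL, In, [timeout(10)]),
        read_line_to_string(In, Line),
        close(In)).
```

A call such as url_first_line('http://www.swi-prolog.org/', Line) requires network access; non-ok replies surface as Prolog exceptions, as described above.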

5.2 The HTTP server library 

Both to simplify re-use of application code and to make it possible to use the
server without committing to a large infrastructure we adopted the reply-strategy
of the CGI protocol, where the handler writes a page consisting of an HTTP header
followed by the document content. Figure 13 provides a simple example that returns
the request-data to the client. By importing thread_httpd.pl we implicitly selected
the multi-threaded server model. Other models provided are inetd_httpd, causing
the (Unix) inetd daemon to start a server for each request, and xpce_httpd, which uses
I/O multiplexing to serve multiple clients without using Prolog threads. The logic
of handling a single HTTP request, given a predicate realising the handler and an
input and output stream, is implemented by http_wrapper.

Replies other than "200 OK" are generated using a Prolog exception. Recognised
replies are defined by the predicate http_reply(+Reply, +Stream, +Header). For
example, to indicate that the user has no access to a page we must use the following
call.

throw(http_reply(forbidden(URL))).

Failure of the handler raises a "404 existence error" reply, while exceptions other 
than the ones described above raise a "500 Server error" reply. 
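
A handler that combines a normal reply with such an exception might look as follows. This is a sketch only: the protected location '/private' is invented for the example, and the handler relies on nothing but the path(Path) element of the parsed request.

```prolog
:- use_module(library(lists)).   % memberchk/2

%% reply(+Request)
%  Refuse access to the (hypothetical) location /private and answer
%  any other path with a small plain-text page in CGI style.
reply(Request) :-
    memberchk(path(Path), Request),
    (   Path == '/private'
    ->  throw(http_reply(forbidden(Path)))      % results in a "forbidden" reply
    ;   format('Content-type: text/plain~n~n'), % header, then an empty line
        format('You asked for ~w~n', [Path])
    ).
```

Because the handler only writes to the current output or throws, it can be exercised from the toplevel without a running server, for example with catch(reply([path('/private')]), E, true).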

:- use_module(library('http/thread_httpd')).

start_server(Port) :-
    http_server(reply, [port(Port)]).

reply(Request) :-
    format('Content-type: text/plain~n~n'),
    writeln(Request).

Fig. 13. A simple HTTP server. The right window shows the client and the format of
the parsed request.

5.2.1 Form parameters

The library http_parameters.pl defines http_parameters(+Request, ?Parameters)
to fetch and type-check parameters transparently for both GET and POST
requests. Figure 14 illustrates the functionality. Parameter values are returned as
atoms. If large documents are transferred using a POST request the user may wish
to revert to http_read_data(+Request, -Data, +Options) underlying http_get/3
to process arguments using plugins.

reply(Request) :-
    http_parameters(Request,
                    [ title(Title, [optional(true)]),
                      name(Name, [length >= 2]),
                      age(Age, [integer])
                    ]), ...

Fig. 14. Fetching HTTP form data
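
Combining the server of figure 13 with the parameter declarations of figure 14 gives a complete, if minimal, form handler. The sketch below assumes the module paths library(http/thread_httpd) and library(http/http_parameters) of current SWI-Prolog distributions; the greeting text and the choice of parameters are invented for the example.

```prolog
:- use_module(library(http/thread_httpd)).      % http_server/2
:- use_module(library(http/http_parameters)).   % http_parameters/2

%% start_server(+Port)
%  Start a multi-threaded server that calls reply/1 for each request.
start_server(Port) :-
    http_server(reply, [port(Port)]).

%% reply(+Request)
%  Fetch and type-check two form parameters, then write a CGI-style
%  plain-text reply. A missing or ill-typed parameter raises an
%  exception, which the server turns into an error reply.
reply(Request) :-
    http_parameters(Request,
                    [ name(Name, [length >= 2]),
                      age(Age, [integer])
                    ]),
    format('Content-type: text/plain~n~n'),
    format('Hello ~w, aged ~w~n', [Name, Age]).
```

After start_server(8080), where the port number is arbitrary, a request for /?name=bob&age=42 should produce the greeting.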

5.2.2 Session management 

The library http_session.pl provides sessions over the stateless HTTP protocol. It
does so by adding a cookie using a randomly generated code if no valid session id is
found in the current request. The interface to the user consists of a predicate to set
options (timeout, cookie-name and path) and a set of wrappers around assert/1
and retract/1, the most important of which are http_session_assert(+Data),
http_session_retract(?Data) and http_session_data(?Data). In the current
version the data associated with sessions that have timed out is simply discarded.
Session-data does not survive a restart of the server.

Note that a session generally consists of a number of HTTP requests and replies. 
Each request is scheduled over the available worker threads and requests belonging 
to the same session are therefore normally not handled by the same thread. This 
implies no session state can be stored in global variables or in the control-structure 
of a thread. If such a style of programming is wanted the user must create a thread
that represents the session and set up communication from the HTTP-worker thread
to the session thread. Figure 15 illustrates the idea.



[Screenshot belonging to Fig. 13: Mozilla Firefox showing the server's reply to a request for /path?name=value on localhost. The page lists the parsed request as Prolog terms, including peer(ip(127,0,0,1)), method(get), search([name=value]), path('/path'), host(localhost), user_agent(...), accept(...), accept_language(...), keep_alive(...), connection('keep-alive') and cookie(...).]

reply(Request) :-                               % HTTP worker
    (   http_session_data(thread(Thread))
    ->  true
    ;   thread_create(session_loop([]), Thread, [detached(true)]),
        http_session_assert(thread(Thread))
    ),
    current_output(CGIOut),
    thread_self(Me),
    thread_send_message(Thread, handle(Request, Me, CGIOut)),
    thread_get_message(_Done).

session_loop(State) :-                          % Session thread
    thread_get_message(handle(Request, Sender, CGIOut)),
    next_state(Request, State, NewState, CGIOut),
    thread_send_message(Sender, done),
    session_loop(NewState).                     % wait for the next request

Fig. 15. Managing a session in a thread. The reply/1 predicate is part of the HTTP
worker pool, while session_loop/1 is executed in the thread handling the session. We
omitted error handling for readability of the example.



Table 2. HTTP performance executing a trivial query 10,000 times. Times are in
seconds. Localhost, dual AMD 1600+ running SuSE Linux 10.0

Connection    Elapsed    Server CPU    Client CPU
Close           20.84         11.70          7.48
Keep-Alive      16.23          8.69          6.73



5.2.3 Evaluation 

The presented server infrastructure is currently used by many internal and external 
projects. Coding a server is very similar to writing CGI handlers and, because the
server runs in the interactive Prolog process, it is much easier to debug. As Prolog is capable of reloading
source files in the running system, handlers can be updated while the server is 
running. Handlers running during the update are likely to die on an exception 
though. We plan to resolve this issue by introducing read/write locks. The protocol 
overhead of the multi-threaded server is illustrated in table 2. 



6 Enabling extensions to the Prolog language 

SWI-Prolog has been developed in the context of projects many of which caused the
development to focus on managing Web documents and protocols. In the previous
sections we have described our Web enabling libraries. In this section we describe
extensions to the ISO-Prolog standard (Deransart et al. 1996) we consider crucial
for scalable and comfortable deployment of Prolog as an agent in a Web centred
world.

6.1 Multi-threading 

Concurrency is necessary for applications for the following reasons: 

• Network delays may cause communication of a single transaction to take very 
long. It is not acceptable if such incidents block access for other clients. This 
can be achieved using multiplexed I/O, multiple processes handling requests 
in a pool or multiple threads in one or more processes handling requests in a 
pool. 

• CPU intensive services must be able to deploy multiple CPUs. This can be 
achieved using multiple instances of the service and load-balancing or a single 
server running on multi-processor hardware or a combination of the two. 

As indicated, none of the requirements above require multi-threading support 
in Prolog. Nevertheless, we added multi-threading (Wielemaker 2003) because it 
resolves the problems mentioned above for medium-scale applications while greatly 
simplifying deployment and debugging in a platform independent way. A multi- 
threaded server also allows maintaining state for a specific session or even shared 
between multiple sessions simply in the Prolog database. The advantages of this 
are described in (Szeredi et al. 1996), using the or-parallel Aurora to serve multiple 
clients. This is particularly interesting for accessing the RDF database described in 
section 4.2. 
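
Sharing state between threads through the Prolog database can be sketched as below. The counter hits/1 and the predicates bump/0 and run_workers/2 are hypothetical names for this example. Dynamic predicates are visible to all threads, so the only care needed is to make the retract/assert pair atomic, for which the sketch uses a named mutex.

```prolog
:- use_module(library(lists)).   % member/2

:- dynamic hits/1.               % shared by all threads
hits(0).

%% bump
%  Increment the shared counter. The mutex makes the read-modify-write
%  sequence atomic with respect to other threads calling bump/0.
bump :-
    with_mutex(hits_mutex,
               (   retract(hits(N0)),
                   N is N0 + 1,
                   assertz(hits(N))
               )).

%% run_workers(+Workers, +Times)
%  Start Workers threads that each call bump/0 Times times and wait
%  until all of them have completed successfully.
run_workers(Workers, Times) :-
    findall(Id,
            (   between(1, Workers, _),
                thread_create(forall(between(1, Times, _), bump), Id, [])
            ),
            Ids),
    forall(member(Id, Ids), thread_join(Id, true)).
```

Starting from hits(0), a call run_workers(4, 100) leaves hits(400) in the database.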

6.2 Atoms and Unicode support 

Unicode [16] is a character encoding system that assigns unique integers (code-points)
to all characters of almost all scripts known in the world. In Unicode 4.0, the
code-points range from 1 to 0x10FFFF. Unicode can handle documents from different
scripts as well as documents that contain multiple scripts in a single uniform
representation, an important feature in applications processing Web data. Traditional
HTML applications commonly insert special symbols through entities such as the
copyright (©) sign, Greek and mathematical symbols, etcetera. Using Unicode we
can represent all entity values as plain text. As illustrated in the famous Semantic
Web layer cake in figure 16, Unicode is at the heart of the Semantic Web.

16 http://www.Unicode.org/
17 http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

Fig. 16. The Semantic Web layer cake by Tim Berners-Lee

HTML documents can be represented using Prolog strings because Prolog integers
can represent all Unicode code-points. As we have claimed in section 2 however,
using Prolog strings is not the most obvious choice. XML attribute names and values
can contain arbitrary Unicode characters, which requires the unnatural use of
strings for these as well. If we consider RDF, URIs can have arbitrary Unicode
characters [17] and we want to represent URIs as atoms to exploit compact storage as
well as fast equivalence testing. Without Unicode support in atoms we would have
to encode Unicode in the atom using escape sequences. All this patchwork can be
avoided if we demand the properties below for Prolog atoms.

• Atoms represent text in Unicode 

• Atoms have no limit on their length 

• The Prolog implementation allows for a large number of atoms, both to
represent URIs and to represent text in HTML/XML documents. SWI-Prolog's
limit is 2^25 (32 million).

• Continuously running servers cannot allow memory leaks and therefore pro- 
cessing dynamic data using atoms requires atom garbage collection. 
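
The properties listed above can be tried out with a few standard built-ins. The predicates greek_atom/1 and churn/1 are hypothetical helpers for this sketch; garbage_collect_atoms/0 is the SWI-Prolog built-in that explicitly triggers atom garbage collection, which normally runs automatically.

```prolog
%% greek_atom(-Atom)
%  Build an atom directly from three Unicode code points
%  (Greek small letters alpha, beta and gamma).
greek_atom(Atom) :-
    atom_codes(Atom, [0x03B1, 0x03B2, 0x03B3]).

%% churn(+N)
%  Create N short-lived atoms, as a server processing dynamic data
%  would, and then reclaim the unreferenced ones.
churn(N) :-
    forall(between(1, N, I),
           (   atom_concat(tmp_, I, Atom),
               atom_length(Atom, _)
           )),
    garbage_collect_atoms.
```

The length of the atom returned by greek_atom/1 is 3, counted in characters rather than bytes, and two atoms with the same text compare equal using ==/2 without inspecting the text.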

7 Case study — A Semantic Web Query Language

In this case-study we describe the SWI-Prolog SeRQL implementation [18]. SeRQL is
an RDF query language developed as part of the Sesame project [19] (Broekstra et al.
2002). SeRQL uses HTTP as its access protocol. Sesame consists of an implementation
of the server as a Java servlet and a Java client-library. By implementing
a compatible framework we made our Prolog based RDF storage and reasoning
engine available to Java clients. The Prolog SeRQL implementation uses all of the
described SWI-Prolog infrastructure and building it has contributed significantly
to the development of the infrastructure. Figure 17 lists the main components of
the server.

The entailment modules are plugins that implement the entailment approach to 
RDF reasoning described in section 4.3. They implement rdf/3 as a pure predi- 
cate, adding implicit triples to the raw triples loaded from RDF/XML documents. 
Figure 18 shows the somewhat simplified entailment module for RDF. The multifile 
rule registers the module as entailment module for the SeRQL system. New mod- 
ules can be loaded dynamically into the platform, providing support for other SW 
languages or application-specific server-side reasoning. Prolog's dynamic loading 
and re-loading allows for updating such reasoning modules on the live server. 

The SeRQL parser is a DCG-based parser translating a SeRQL query text into a 



18 http://www.swi-prolog.org/packages/SeRQL
19 http://www.openrdf.org

Fig. 17. Module dependencies of the SeRQL system. Arrows denote 'imports from'
relations.



:- module(rdf_entailment, [rdf/3]).

rdf(S, P, O) :-                         % explicit triples
    rdf_db:rdf(S, P, O).
rdf(S, rdf:type, rdf:'Property') :-     % anything used as predicate is a property
    rdf_db:rdf(_, S, _),
    \+ rdf_db:rdf(S, rdf:type, rdf:'Property').
rdf(S, rdf:type, rdfs:'Resource') :-    % every subject is a resource
    rdf_db:rdf_subject(S),
    \+ rdf_db:rdf(S, rdf:type, rdfs:'Resource').

:- multifile serql:entailment/2.
serql:entailment(rdf, rdf_entailment).

Fig. 18. RDF entailment module



compound goal calling rdf/3 and predicates from the SeRQL runtime library which
provide comparison and functions built into the SeRQL language. The resulting
control-structure is passed to the query optimiser (Wielemaker 2005) which uses
statistics maintained by the RDF database to reorder the pure rdf/3 calls for
best performance. The optimiser uses a generate-and-evaluate approach to find
the optimal order. To keep the search space manageable for the frequently long
conjunctions of rdf/3 calls, the conjunction is split into independent parts. Figure 19
illustrates this in a very simple example. During abstract execution, information on
instantiation and types implied by the runtime library predicates is attached to the
variables using dynamic attributed variables (Demoen 2002).
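
The idea of splitting a conjunction can be sketched in a few lines. This is an illustration under simplifying assumptions and not the code of the optimiser: the predicate names are hypothetical, goals are given as a list, and two goals are considered dependent exactly when they share a variable that is not yet bound. Goals that are connected through such variables end up in the same part; the order of independent parts relative to each other does not have to be explored.

```prolog
:- use_module(library(apply)).   % exclude/3
:- use_module(library(lists)).   % member/2, append/3

%% independent_parts(+Goals, +Bound, -Parts)
%  Partition Goals into lists of goals such that goals in different
%  lists share no variables other than those in the list Bound.
independent_parts([], _, []).
independent_parts([G|Gs], Bound, [[G|Part]|Parts]) :-
    free_vars(G, Bound, Vs),
    grow(Vs, Gs, Bound, Part, Rest),
    independent_parts(Rest, Bound, Parts).

% grow(+Vars, +Goals, +Bound, -Connected, -Rest): collect the goals
% that are linked, directly or indirectly, to Vars by a free variable.
grow(Vs, Goals, Bound, [G|Part], Rest) :-
    pick_sharing(Goals, Vs, Bound, G, GVs, Goals1),
    !,
    append(Vs, GVs, Vs1),
    grow(Vs1, Goals1, Bound, Part, Rest).
grow(_, Goals, _, [], Goals).

% pick_sharing/6: remove the first goal that shares a free variable.
pick_sharing([G|Gs], Vs, Bound, G, GVs, Gs) :-
    free_vars(G, Bound, GVs),
    shares(GVs, Vs),
    !.
pick_sharing([G0|Gs], Vs, Bound, G, GVs, [G0|Rest]) :-
    pick_sharing(Gs, Vs, Bound, G, GVs, Rest).

% free_vars(+Term, +Bound, -Free): variables of Term not in Bound.
free_vars(Term, Bound, Free) :-
    term_variables(Term, All),
    term_variables(Bound, BoundVars),
    exclude(var_in(BoundVars), All, Free).

var_in(Vars, V) :- member(X, Vars), X == V, !.
shares(As, Bs)  :- member(A, As), var_in(Bs, A), !.
```

For the conjunction of figure 19, once Author is considered bound the goals rdf(Author, name, Name) and rdf(Author, affiliation, Affil) fall into two separate parts, whereas with no bound variables they form a single part.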

rdf(Paper, author, Author),
rdf(Author, name, Name),
rdf(Author, affiliation, Affil),

Fig. 19. Split rdf conjunctions. After executing the first rdf/3 query Author is bound and
the two subsequent queries become independent. This is also true for other orderings, so
we only need to evaluate 3 alternatives instead of 3! (6).

HTTP access consists of two parts. The human-centred portal consists of HTML
pages with forms to administer the server as well as view statistics, load and unload
documents and run SeRQL queries interactively presenting the result as an HTML
table. Dynamic pages are generated using the html_write.pl library described in
section 3.1. Static pages are served from HTML files by the Prolog server. Machines
use HTTP POST requests to provide query data and get a reply in XML or
RDF/XML.

The system knows about various RDF input and output formats. To achieve
modularity the kernel exchanges RDF graphs as lists of terms rdf(S,P,O) and
result-tables as lists of terms using the functor row and arity equal to the number of
columns in the table. The system calls a multifile predicate using the format
identifier and data to realise the requested format. The HTML output format uses
html_write.pl. The RDF/XML format uses rdf_write_xml/2 described in
section 4.1. Both rdf_write_xml/2 and the other XML output format use straight
calls to format/3 to write the document, where quoting values is realised by quoting
primitives provided by the SGML/XML parser described in section 2. Using direct
writing instead of techniques described in section 3.1 avoids potentially large
intermediate datastructures and is not very complicated given the very simple structure
of the documents.
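
The direct-writing style can be sketched as follows. The element names table, row and cell are invented for this example and do not reflect the actual SeRQL result format; xml_quote_cdata/2 is assumed to be the quoting primitive exported by library(sgml) in current SWI-Prolog versions, and cell values are assumed to be atoms.

```prolog
:- use_module(library(sgml)).    % xml_quote_cdata/2
:- use_module(library(lists)).   % member/2

%% write_rows(+Out, +Rows)
%  Write a list of row(...) terms as a simple XML table to the stream
%  Out. Each row is written as soon as it is visited; no DOM term for
%  the whole document is ever built.
write_rows(Out, Rows) :-
    format(Out, '<table>~n', []),
    forall(member(Row, Rows), write_row(Out, Row)),
    format(Out, '</table>~n', []).

write_row(Out, Row) :-
    Row =.. [row|Cells],
    format(Out, '  <row>', []),
    forall(member(Cell, Cells),
           (   xml_quote_cdata(Cell, Quoted),   % escape <, > and &
               format(Out, '<cell>~w</cell>', [Quoted])
           )),
    format(Out, '</row>~n', []).
```

Writing the row row('a<b', 'c&d') this way emits the cells with the characters < and & replaced by the corresponding entities.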



7.1 Evaluation

The SeRQL server and the SWI-Prolog library development are too closely integrated
to use the server as an evaluation of the functionality provided by the Web enabling libraries.
We compared our server to Sesame, written in Java. The source code of the Prolog
based server is 6,700 lines, compared to 86,000 for Sesame. As both systems have
very different coverage in functionality and can re-use libraries at different levels
it is hard to judge these figures. Both answer trivial queries in approximately 5ms
on a dual AMD 1600+ PC running Linux 2.6. On complex queries the two systems
perform very differently. Sesame's forward reasoning makes it handle some RDFS
queries much faster. Sesame does not contain a query optimizer, which causes
order-dependent and sometimes very long response times on large conjunctions.

The power of LP where programs can be handled as data is exploited by parsing 
the SeRQL query into a program, optimizing the program by manipulating it as 
data, after which we can simply call it to answer the query. The non-deterministic 
nature of rdf/3 allows for a trivial translation of the query to a non-deterministic 
program that produces the answers on backtracking. 

The server only depends on the standard SWI-Prolog distribution and therefore
runs unmodified on all systems supporting SWI-Prolog. It has been tested on
Windows, Linux and MacOS X.

All infrastructure described is used in the server. We use format/3, exploiting
XML quoting primitives provided by the Prolog XML library to print highly
repetitive XML files such as the SeRQL result-table. Alternatively we could have
created the corresponding DOM term and called xml_write/2 from the library
sgml_write.pl.

8 Case study — XDIG 

In section 7 we discussed how SWI-Prolog is used for an RDF
query system, i.e., a meta-data management and reasoning system. In this section
we describe a Prolog-powered system for ontology management and reasoning based
on Description Logics (DL). DL has greatly influenced the design of the W3C
ontology language OWL. The DL community, called DIG (DL Implementation Group),
has developed a standard for accessing DL reasoning engines called the DIG
description logic interface [20] (Bechhofer et al. 2003), DIG interface for short. Many
DL reasoners like Racer (Haarslev and Möller 2001) and FACT (Horrocks 1999)
support the DIG interface, allowing for the construction of highly portable and
reusable DL components or extensions.

In this case study, we describe XDIG, an eXtended DIG Description Logic
interface, which has been implemented on top of the SWI-Prolog Web libraries. The
DIG interface uses an XML-based messaging protocol on top of HTTP. Clients of 
a DL reasoner communicate by means of HTTP POST requests. The body of the 
request is an XML encoded message which corresponds to the DL concept language. 
Where OWL is based on the triple model described in section 4, DIG statements 
are grounded directly in XML. Figure 20 shows a DIG statement which defines the 
concept MadCow as a cow which eats brains, part of sheep. 

8.1 Architecture of XDIG 

The XDIG libraries form a framework to build DL reasoners that have additional 
reasoning capabilities. XDIG serves as a regular DL reasoner via its corresponding 
DIG interface. An intermediate XDIG server can make systems independent from 
application specific characteristics. A highly decoupled infrastructure significantly 
improves the reusability and applicability of software components. 

<equalc>
  <catom name='mad+cow'/>
  <and>
    <catom name='cow'/>
    <some>
      <ratom name='eats'/>
      <and>
        <catom name='brain'/>
        <some>
          <ratom name='part+of'/>
          <catom name='sheep'/>
</some></and></some></and></equalc>

Fig. 20. A DIG statement on MadCow

Fig. 21. Architecture of XDIG

20 http://dl.kr.org/dig/

The general architecture of XDIG is shown in figure 21. It consists of the following
components:

XDIG Server The XDIG server deals with requests from ontology applications. It
supports our extended DIG interface, i.e., it not only supports standard DIG/DL
requests, like 'tell' and 'ask', but also additional processing features like changing
system settings. The library dig_server.pl implements the XDIG protocol
on top of the Prolog HTTP server described in section 5.2. The predicate
dig_server(+Request) is called from the HTTP server to process a client's
Request as illustrated in figure 13. XDIG server developers have to define the
predicate my_dig_server_processing(+Data, -Answer, +Options), where Data is
the parsed DIG XML request and Answer is a term answer(-Header, -Reply).
Reply is the XML-DOM term representing the answer to the query.

DIG Client XDIG is designed to rely on an external DL reasoner. It
implements a regular DIG interface client and calls the external DL
reasoner to access the standard DL reasoning capabilities. The predicate
dig_post(+Data, -Reply, +Options) posts the data to the external DIG server.
The predicates are defined in terms of the predicate http_post/4 and others in
the HTTP and XML libraries.

Main Control Component The library dig_process.pl provides facilities to
analyse DIG statements such as finding concepts, individuals and roles, but also
to decide satisfiability of concepts and consistency. Some of this processing is
done by analysing the XML-DOM representation of DIG statements in the local
repository, while satisfiability and consistency checking is achieved by accessing
external DL reasoners through the DIG client module.

Ontology Repository The Ontology Repository serves as an internal knowledge
base (KB), which is used to store multiple ontologies locally. These ontology
statements are used for further processing when the reasoner receives an 'ask'
request. The main control component usually selects parts from the ontologies
to post them to an external DL reasoner and obtain the corresponding answers.
This internal KB is also used to store system settings.

direct_concept_relevant(element(catom, Atts, _), Concept) :-
    memberchk(name=Concept, Atts).
direct_concept_relevant(element(_, _, Content), Concept) :-
    direct_concept_relevant(Content, Concept).
direct_concept_relevant([H|T], Concept) :-
    (   direct_concept_relevant(H, Concept)
    ;   direct_concept_relevant(T, Concept)
    ).

Fig. 22. direct_concept_relevant/2 checks that a concept is referenced by a DIG statement

As DIG statements are XML based, XDIG stores statements in the local
repository using the XML-DOM representation described in section 2. The tree model of
XDIG data has proved to be convenient for DIG data management.

Figure 22 shows a piece of code from the library XDIG defining the predicate
direct_concept_relevant(+DOM, ?Concept) which checks if a set of Statements
is directly relevant to a Concept, namely the Concept appears in the body of a
statement in the list. The predicate direct_concept_relevant/2 has been used to
develop PION for reasoning with inconsistent ontologies, and DION for inconsistent
ontology debugging.
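
As a usage sketch, the predicate of figure 22 can be applied to the MadCow statement of figure 20. The DOM term below was written by hand for this example, following the element(Name, Attributes, Content) representation of section 2; mad_cow/1 and concepts/1 are hypothetical helper names.

```prolog
:- use_module(library(lists)).   % memberchk/2

% Definition as given in figure 22.
direct_concept_relevant(element(catom, Atts, _), Concept) :-
    memberchk(name=Concept, Atts).
direct_concept_relevant(element(_, _, Content), Concept) :-
    direct_concept_relevant(Content, Concept).
direct_concept_relevant([H|T], Concept) :-
    (   direct_concept_relevant(H, Concept)
    ;   direct_concept_relevant(T, Concept)
    ).

% Hand-written DOM term for the DIG statement of figure 20.
mad_cow(element(equalc, [],
    [ element(catom, [name='mad+cow'], []),
      element(and, [],
        [ element(catom, [name=cow], []),
          element(some, [],
            [ element(ratom, [name=eats], []),
              element(and, [],
                [ element(catom, [name=brain], []),
                  element(some, [],
                    [ element(ratom, [name='part+of'], []),
                      element(catom, [name=sheep], [])
                    ])
                ])
            ])
        ])
    ])).

%% concepts(-Concepts)
%  All concept names referenced by the MadCow statement, in document order.
concepts(Concepts) :-
    mad_cow(DOM),
    findall(C, direct_concept_relevant(DOM, C), Concepts).
```

Here concepts(Cs) yields Cs = ['mad+cow', cow, brain, sheep]; the role names eats and part+of are skipped because they occur in ratom rather than catom elements.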

8.2 Application 

XDIG has been used to develop several DL reasoning services. PION is a
reasoning system that deals with inconsistent ontologies [21] (Huang and Visser 2004;
Huang et al. 2005). MORE is a multi-version ontology reasoner [22] (Huang and
Stuckenschmidt 2005). DION is a debugger of inconsistent ontologies [23] (Schlobach and
Huang 2005). With the support of an external DL reasoner like Racer, DION can
serve as an inconsistent ontology debugger using a bottom-up approach.

9 Case study — Faceted browser on Semantic Web database
integrating multiple collections

In this case study we describe a pilot for the STITCH-project [24] whose main aim is
studying and finding solutions for the problem of integrating controlled vocabularies
such as thesauri and classification systems in the Cultural Heritage domain. The
pilot consists of the integration of two collections - the Medieval Illuminations of the
Dutch National Library (Koninklijke Bibliotheek) and the Masterpieces collection
from the Rijksmuseum - and development of a user interface for browsing the
merged collections. One requirement within the pilot is to use "standard Semantic
Web techniques" during all stages, so as to be able to evaluate their added value.
An explicit research goal was to evaluate existing "ontology mapping" tools.

21 http://wasp.cs.vu.nl/sekt/pion
22 http://wasp.cs.vu.nl/sekt/more
23 http://wasp.cs.vu.nl/sekt/dion
24 http://stitch.cs.vu.nl

The problem could be split into three main tasks:

• Gathering data, i.e. records of the collections and controlled vocabularies they 
use, and transforming it into RDF. 

• Establishing semantic links between the vocabularies using off-the-shelf on- 
tology mapping tools. 

• Building a prototype User Interface (UI) to access (search and browse) the 
integrated collections and experiment with different ways to access them using 
a Web server. 

SWI-Prolog has been used in all three tasks; to illustrate the use of the SWI-Prolog 
Web libraries in the pilot, in the next section we focus on their application in the 
prototype UI because it is the largest subsystem using these libraries. 

9.1 Multi-Faceted Browser 

Multi-Faceted Browsing is a search and browse paradigm where a collection is
accessed by refining multiple (preferably) structured aspects - called facets - of
its elements. For the user interface and user interaction we have been influenced
by the approach of Flamenco (Hearst et al. 2002). The Multi-Faceted Browser is
implemented in SWI-Prolog. All data is stored in an RDF database, which can be
either an external SeRQL repository or an in-memory SWI-Prolog RDF database.
The system consists of three components: RDF-interaction, which deals with
RDF-database storage and access; HTML-code generation, for the creation of Web pages;
and the Web server component, implementing the HTTP server. They are discussed
in the following sections.

9.1.1 RDF-interaction 

We first describe the content of the RDF database before explaining how to access 
it. The RDF database contains: 

• 750 records from the Rijksmuseum, and 1000 from the Koninklijke Biblio- 
theek. 

• RDF representation of the hierarchically structured facets. We use SKOS
(http://www.w3.org/2004/02/skos/), a model dedicated to the representation
of controlled vocabularies.

SELECT Rec, RecTitle, RecThumb, CollSpec
FROM {SiteId} rdfs:label {"ARIATOPS-NDNE"};
              mfs:collection-spec {CollSpec} mfs:record-type {RT};
              mfs:shorttitle-prop {TitleProp};
              mfs:thumbnail-prop {ThumbProp},
     {Rec} rdf:type {RT}; TitleProp {RecTitle};
           ThumbProp {RecThumb},
     {Rec} rdf:type {<http://www.telin.nl/rdf/topia#AnimalPieces>},
     {Rec} rdf:type {<http://www.telin.nl/rdf/topia#Paintings>}
USING NAMESPACE skos = <http://www.w3.org/2004/02/skos/core#>,
                mfs = <http://www.cs.vu.nl/STITCH/pp/mf-schema#>

Fig. 23. An example of a SeRQL query, which returns details of records matching two
facet values (AnimalPieces and Paintings)



• Mappings between SKOS Concept Schemes used in the different collections. 

• Portal-specific information as "Site Configuration Objects", identified by URIs
with properties defining what collections are part of the setup, what facets 
are shown, and also values for the constant text in the Web page presen- 
tation and other User Interface configuration properties. Multiple such Site 
Configuration Objects may be defined in a repository. 

The in-memory RDF store contains, depending on the number of mappings and 
structured vocabularies that are stored in the database, about 300,000 RDF triples. 
The Sesame store contains more triples - 520,000 - as its RDFS-entailment imple- 
mentation implies generation of derived triples (see section 7). 

RDF database access. Querying the RDF store for more complex results based on
URL query arguments consists of three steps: 1) building SeRQL queries from
URL query arguments, 2) passing them on to the SeRQL engine and gathering the
result rows, and 3) post-processing the output, e.g. counting elements and
sorting them. Figure 23 shows an example of a generated SeRQL query. Finding
matching records involves finding records annotated by the facet value or by
a value that is a hierarchical descendant of the facet value. We implemented this by
interpreting records as instances of SKOS concepts and using the transitive and
reflexive properties of the rdfs:subClassOf property. This explains, for example,
{Rec} rdf:type {<http://www.telin.nl/rdf/topia#Paintings>} in Figure 23.
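
Steps 1 and 3 above can be illustrated with a minimal sketch. It is not the
code used in the pilot: the predicate names facet_constraint/2, records_query/2
and post_process/3 are hypothetical, the generated query is reduced to the
facet constraints only, and step 2 (submitting the query to a SeRQL engine) is
left out because the query interface is not shown in the text.

```prolog
:- use_module(library(apply)).   % maplist/3
:- use_module(library(lists)).

% facet_constraint(+ClassURI, -Constraint)
% One SeRQL path expression per selected facet value (hypothetical helper).
facet_constraint(URI, Constraint) :-
    format(atom(Constraint), '{Rec} rdf:type {<~w>}', [URI]).

% records_query(+FacetURIs, -Query)
% Step 1: build the query text from the facet values found in the URL.
% At least one facet value is required, otherwise the FROM clause is empty.
records_query(FacetURIs, Query) :-
    FacetURIs = [_|_],
    maplist(facet_constraint, FacetURIs, Constraints),
    atomic_list_concat(Constraints, ',\n     ', From),
    format(atom(Query), 'SELECT Rec\nFROM ~w', [From]).

% post_process(+Rows, -Count, -Sorted)
% Step 3: sort the result rows, dropping duplicates, and count what is left.
post_process(Rows, Count, Sorted) :-
    sort(Rows, Sorted),
    length(Sorted, Count).
```

Because every facet value adds one more rdf:type constraint on the same
variable Rec, selecting an additional facet value narrows the result set, which
is the refinement behaviour the browser relies on.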

The SeRQL-query interface contains timing and debugging facilities for single
queries; for flexibility it provides access to an external SeRQL server, for which
we used Sesame (http://www.openrdf.org), but also to the in-memory store of the
SWI-Prolog SeRQL implementation described in section 7. To reach external servers
we used the sesame_client.pl library, which provides an interface to external
SeRQL servers and is packaged with the SWI-Prolog SeRQL library.

objectstr([], _O, _Cols, _Args) --> [].
objectstr([RowObjects|ObjectsList], Offset, Cols, Args) -->
    { Percentage is 100/Cols },
    html(tr(valign(top), \objectstd(RowObjects, Offset, Percentage, Args))),
    { Offset1 is Offset + Cols },
    objectstr(ObjectsList, Offset1, Cols, Args).

objectstd([], _, _, _) --> [].
objectstd([Url|RowObjects], Index, Percentage, Args) -->
    { ...
      construct_href_index(..., HRef),
      missing_picture_txt(Url, MP)
    },
    html(td(width(Percentage), a([href(HRef)], img([src(Url), alt(MP)])))),
    { Index1 is Index + 1 },
    objectstd(RowObjects, Index1, Percentage, Args).

Fig. 24. Part of the HTML code generation for displaying all the images of a query
result in an HTML table



9.1.2 HTML-code generation 

We used the SWI-Prolog html_write.pl library described in section 3.1 for our 
HTML-code generation. There are three distinct kinds of Web pages the multi-faceted
browser generates: the portal access page, the refinement page and the single
collection-item page. The DCG approach to generating HTML code made it easy 
to share HTML-code generating procedures such as common headers and HTML 
code for refinement of choices. The HTML-code generation component contains 
some 140 DCG rules (1200 lines of Prolog code of which 800 lines are DCG rules), 
part of which are simple list-traversing rules such as the example of Figure 24. 
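
The sharing of HTML-generating rules can be sketched as follows. This is an
illustration only, not code from the prototype: page_header//1, thumbs//1,
result_page//2 and show/2 are hypothetical names, while html//1 and
print_html/1 are taken from the html_write.pl library.

```prolog
:- use_module(library(http/html_write)).

% Hypothetical header rule shared by all page types.
page_header(Title) -->
    html(h1(Title)).

% Simple list-traversing rule: one table cell per thumbnail URL.
thumbs([]) --> [].
thumbs([Url|Urls]) -->
    html(td(img([src(Url), alt(thumbnail)]))),
    thumbs(Urls).

% A page body that reuses both rules; \Rule calls a DCG rule from html//1.
result_page(Title, Urls) -->
    page_header(Title),
    html(table(tr(\thumbs(Urls)))).

% Translate the page into a token list and write it to current output.
show(Title, Urls) :-
    phrase(result_page(Title, Urls), Tokens),
    print_html(Tokens).
```

A call such as show('Results', ['a.png', 'b.png']) writes a heading followed by
a one-row table. Any other page type can call page_header//1 in the same way,
which is the kind of reuse the DCG approach makes cheap.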

9.1.3 Web Server 

The Web server is implemented using the HTTP server library described in
section 5.2. The Web server component itself is very small. It follows the skeleton code
described in Figure 13. In our case the reply/1 predicate extracts the URL root
and parameters from the URL. The Site Configuration Object, which is introduced
in section 9.1.1, is returned by the RDF-interaction component based on the URL
root. It is passed on to the HTML-code generation component, which generates Web
content as shown in reply_page/1 in Figure 3.
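
The flow described above might look roughly like the sketch below. It is an
assumption-laden outline, not the pilot's server: site_config/2 is a
hypothetical stand-in for the RDF-interaction component and param_list//1 for
the HTML-code generation component. http_server/2 and reply_html_page/2 are
existing predicates of the SWI-Prolog HTTP libraries.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/html_write)).
:- use_module(library(lists)).

% Start a multi-threaded HTTP server on Port; reply/1 handles each request.
server(Port) :-
    http_server(reply, [port(Port)]).

% reply(+Request): take the URL root and the parameters from the request,
% look up the site configuration and emit the page.
reply(Request) :-
    memberchk(path(Path), Request),
    (   memberchk(search(Params), Request)
    ->  true
    ;   Params = []
    ),
    site_config(Path, Site),
    reply_html_page(title(Site), \param_list(Params)).

% Hypothetical lookup table replacing the query on the RDF database.
site_config('/demo', 'Demo portal') :- !.
site_config(_, 'Unknown site').

% Hypothetical page body: one paragraph per URL parameter.
param_list([]) --> [].
param_list([Name=Value|T]) -->
    html(p([Name, ' = ', Value])),
    param_list(T).
```

Under these assumptions, running server(8080) and requesting /demo?q=cats
returns a page titled "Demo portal" that lists the parameter q.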

9.2 Evaluation 

This case study shows that SWI-Prolog is effective for building applications in
the context of the Semantic Web. In a single month a fully functional prototype
portal has been created providing structured access to multiple collections.
Independence from external libraries and the full support of all libraries on different
platforms made it easy to develop and install on different operating systems. All
case study software has been tested to install and run transparently both on Linux
and on Microsoft Windows.

At the start of the pilot project we briefly evaluated existing environments for
creating multi-faceted browsing portals: we considered the software available from
the Flamenco Project (Hearst et al. 2002) and the OntoViews Semantic Portal
Creation Tool (Mäkelä et al. 2004). The Flamenco software would require developing a
translation from RDF to the Flamenco data representation. OntoViews heavily uses
Semantic Web techniques, but the software was unnecessarily complex for our pilot,
requiring a number of external libraries. This, together with our need for flexible
experiments with various setups, made us decide to build our own prototype.

The prototype allowed us to easily experiment with and develop various
interesting ways of providing users access to integrated heterogeneous collections
(van Gendt et al. 2006). The pilot project portal is accessible online for your
evaluation at http://stitch.cs.vu.nl/demo.html.

10 Conclusion 

We have presented an overview of the libraries and Prolog language extensions we 
have implemented and which we provide to the LP community as Open Source 
resources. As the presented libraries cover very different functionality we have com- 
pared our approach with related work throughout the document. We have demon- 
strated that Prolog, equipped with an HTTP server library, libraries for reading and 
writing markup documents, multi-threading, Unicode support, unbounded atoms 
and atom garbage collection, becomes a flexible component in a multi-tier server ar- 
chitecture. Automatic memory management and the absence of pointers and other 
dangerous language constructs justify the use of Prolog in security-sensitive
environments. The middleware, typically dealing with the application logic, is the most
obvious tier for exploiting Prolog. In our case studies we have seen Prolog active as
a storage component (section 7), as middleware (section 8) and in the presentation
tier (section 9).

Development in the near future is expected to concentrate on Semantic Web
reasoning, such as the translation of SWRL rules to logic programs. Such translations
will benefit from tabling to reach more predictable response times and allow
for more declarative programs. We plan to add more support for OWL reasoning,
possibly supporting vital relations for ontology mapping such as owl:sameAs in
the low-level store. We also plan to add PSP/PWP-like (section 3) page-generation
facilities.